 physiological data




StressID: a Multimodal Dataset for Stress Identification

Neural Information Processing Systems

Total size: 5.29 GB. Total duration across subjects and tasks: physiological signals 1119 min, video 918 min, audio 385 min. (Figure 1: a dataset summary card for StressID, constructed based on [2, 5]. Figure 2: Organisation of the ...)


A Unified AI Approach for Continuous Monitoring of Human Health and Diseases from Intensive Care Unit to Home with Physiological Foundation Models (UNIPHY+)

Wang, Minxiao, Kataria, Saurabh, Ni, Juntong, Buchman, Timothy G., Grunwell, Jocelyn, Mai, Mark, Jin, Wei, Clark, Matthew, Brown, Stephanie, Fundora, Michael, Sharma, Puneet, Pan, Tony, Khan, Sam, Ruchti, Timothy, Muthu, Naveen, Maher, Kevin, Bhavani, Sivasubramanium V, Hu, Xiao

arXiv.org Artificial Intelligence

We present UNIPHY+, a unified physiological foundation model (physioFM) framework designed to enable continuous monitoring of human health and diseases across care settings using ubiquitously obtainable physiological data. We propose novel strategies for incorporating contextual information during pretraining, fine-tuning, and lightweight model personalization via multi-modal learning, feature fusion-tuning, and knowledge distillation. We advocate testing UNIPHY+ on a broad set of use cases, from intensive care to ambulatory monitoring, to demonstrate that UNIPHY+ can empower generalizable, scalable, and personalized physiological AI to support both clinical decision-making and long-term health monitoring.
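
As a rough illustration of one personalization strategy named above (knowledge distillation from a physiological foundation model into a lightweight per-patient model), the following Python sketch distills a frozen teacher encoder into a smaller student on one patient's waveforms. The module names, layer sizes, and feature-level MSE objective are illustrative assumptions, not the UNIPHY+ implementation.

# Minimal distillation sketch (assumptions, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherEncoder(nn.Module):            # stands in for a pretrained physioFM
    def __init__(self, in_ch=3, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 64, 7, padding=3), nn.ReLU(),
            nn.Conv1d(64, dim, 7, padding=3), nn.AdaptiveAvgPool1d(1))
    def forward(self, x):                    # x: (batch, channels, time)
        return self.net(x).squeeze(-1)       # (batch, dim)

class StudentEncoder(nn.Module):             # lightweight personalized model
    def __init__(self, in_ch=3, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 16, 7, padding=3), nn.ReLU(),
            nn.Conv1d(16, dim, 7, padding=3), nn.AdaptiveAvgPool1d(1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

teacher, student = TeacherEncoder().eval(), StudentEncoder()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(8, 3, 1000)                  # a batch of one patient's waveforms
with torch.no_grad():
    target = teacher(x)                      # frozen teacher features
opt.zero_grad()
loss = F.mse_loss(student(x), target)        # feature-level distillation loss
loss.backward()
opt.step()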


Distinguishing Startle from Surprise Events Based on Physiological Signals

Sharma, Mansi, Duchevet, Alexandre, Daiber, Florian, Imbert, Jean-Paul, Rekrut, Maurice

arXiv.org Artificial Intelligence

Unexpected events can impair attention and delay decision-making, posing serious safety risks in high-risk environments such as aviation. In particular, reactions like startle and surprise can impact pilot performance in different ways, yet are often hard to distinguish in practice. Existing research has largely studied these reactions separately, with limited focus on their combined effects or on how to differentiate them using physiological data. In this work, we address this gap by distinguishing between startle and surprise events based on physiological signals using machine learning and multi-modal fusion strategies. Our results demonstrate that these events can be reliably predicted, achieving the highest mean accuracy of 85.7% with SVM and Late Fusion. To further validate the robustness of our model, we extended the evaluation to include a baseline condition, successfully differentiating between Startle, Surprise, and Baseline states with the highest mean accuracy of 74.9% with XGBoost and Late Fusion.
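
A minimal sketch of the late-fusion strategy evaluated above: one classifier per physiological modality, with class probabilities averaged at decision time. The modality names, features, and data below are synthetic placeholders; the authors' feature extraction and model tuning are not reproduced.

# Late fusion of per-modality SVMs (illustrative sketch with synthetic data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                        # 0 = surprise, 1 = startle
modalities = {                                   # per-modality feature matrices
    "eda": rng.normal(size=(n, 10)) + y[:, None] * 0.5,
    "ecg": rng.normal(size=(n, 12)) + y[:, None] * 0.3,
}

# Train one probabilistic SVM per modality on the first 150 samples.
models = {m: SVC(probability=True).fit(X[:150], y[:150])
          for m, X in modalities.items()}

# Late fusion: average the per-modality probability estimates on the held-out part.
proba = np.mean([models[m].predict_proba(X[150:])
                 for m, X in modalities.items()], axis=0)
pred = proba.argmax(axis=1)
print("fused accuracy:", (pred == y[150:]).mean())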


ASLSL: Adaptive shared latent structure learning with incomplete multi-modal physiological data for multi-dimensional emotional feature selection

Xu, Xueyuan, Yu, Tianze, Dong, Wenjia, Wei, Fulin, Zhuo, Li

arXiv.org Artificial Intelligence

Recently, emotion recognition based on multi-modal physiological signals has garnered increasing attention in the field of brain-computer interfaces. Nevertheless, the associated multi-modal physiological features are often high-dimensional and inevitably include irrelevant, redundant, and noisy representations, which can easily lead to overfitting, poor performance, and high computational complexity in emotion classifiers. Feature selection has been widely applied to address these challenges. However, previous studies generally assumed that multi-modal physiological data are complete, whereas in reality the data are often incomplete due to the openness of the acquisition and operational environment. For example, some samples are available in several modalities but not in others. To address this issue, we propose a novel method for incomplete multi-modal physiological signal feature selection called adaptive shared latent structure learning (ASLSL). Based on the property that similar features share similar emotional labels, ASLSL employs adaptive shared latent structure learning to explore a common latent space shared by incomplete multi-modal physiological signals and multi-dimensional emotional labels, thereby mitigating the impact of missing information and mining consensus information. Two of the most popular multi-modal physiological emotion datasets (DEAP and DREAMER) with multi-dimensional emotional labels were used to compare ASLSL against seventeen feature selection methods. Comprehensive experimental results on these datasets demonstrate the effectiveness of ASLSL.
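
To make the shared-latent-space idea concrete, the sketch below fits a common latent code per sample from which each available modality is reconstructed, simply skipping the reconstruction term for samples missing a modality. This is a generic illustration of the principle under stated assumptions (random data, plain gradient descent), not the ASLSL algorithm itself, which additionally performs adaptive structure learning and feature selection.

# Shared latent codes under missing modalities (illustrative sketch).
import torch
import torch.nn as nn

n, d1, d2, k = 100, 32, 24, 8
X1, X2 = torch.randn(n, d1), torch.randn(n, d2)   # two synthetic modalities
mask1 = torch.rand(n) > 0.2                       # which samples have modality 1
mask2 = torch.rand(n) > 0.3                       # which samples have modality 2

Z = nn.Parameter(torch.randn(n, k) * 0.1)         # shared latent codes
W1 = nn.Parameter(torch.randn(k, d1) * 0.1)       # modality-specific decoders
W2 = nn.Parameter(torch.randn(k, d2) * 0.1)
opt = torch.optim.Adam([Z, W1, W2], lr=1e-2)

for _ in range(500):
    opt.zero_grad()
    loss = ((Z[mask1] @ W1 - X1[mask1]) ** 2).mean() \
         + ((Z[mask2] @ W2 - X2[mask2]) ** 2).mean() \
         + 1e-3 * Z.pow(2).mean()                 # simple regularizer
    loss.backward()
    opt.step()
# Z now holds a consensus representation usable for downstream selection or classification.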


UniPhyNet: A Unified Network For Multimodal Physiological Raw Signal Classification

Qiu, Renxiang, Selvan, Raghavendra

arXiv.org Machine Learning

We present UniPhyNet, a novel neural network architecture to classify cognitive load using multimodal physiological data -- specifically EEG, ECG and EDA signals -- without the explicit need for extracting hand-crafted features. UniPhyNet integrates multiscale parallel convolutional blocks and ResNet-type blocks enhanced with a channel block attention module to focus on informative features, while a bidirectional gated recurrent unit captures temporal dependencies. This architecture processes and combines signals in both unimodal and multimodal configurations via intermediate fusion of learned feature maps. On the CL-Drive dataset, UniPhyNet improves raw-signal classification accuracy from 70% to 80% (binary) and 62% to 74% (ternary), outperforming feature-based models and demonstrating its effectiveness as an end-to-end solution for real-world cognitive state monitoring.
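
The following PyTorch sketch shows the kind of unimodal branch the abstract describes (parallel multiscale convolutions, channel attention, a bidirectional GRU), combined by intermediate fusion of the learned features. Layer widths, kernel sizes, and channel counts are assumptions for illustration; this is not the released UniPhyNet code.

# Multiscale conv + channel attention + BiGRU branches with intermediate fusion (sketch).
import torch
import torch.nn as nn

class Branch(nn.Module):
    def __init__(self, in_ch, dim=32):
        super().__init__()
        # parallel convolutions at several kernel sizes (multiscale)
        self.convs = nn.ModuleList(
            nn.Conv1d(in_ch, dim, k, padding=k // 2) for k in (3, 7, 15))
        # squeeze-and-excitation style channel attention
        self.att = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(3 * dim, 3 * dim), nn.Sigmoid())
        self.gru = nn.GRU(3 * dim, dim, batch_first=True, bidirectional=True)

    def forward(self, x):                        # x: (batch, channels, time)
        h = torch.cat([c(x) for c in self.convs], dim=1)
        h = h * self.att(h).unsqueeze(-1)        # reweight channels
        out, _ = self.gru(h.transpose(1, 2))     # temporal modelling
        return out.mean(dim=1)                   # (batch, 2*dim)

class FusionNet(nn.Module):
    def __init__(self, ch_per_mod=(4, 1, 1), n_classes=3):
        super().__init__()                       # e.g. EEG, ECG, EDA channel counts
        self.branches = nn.ModuleList(Branch(c) for c in ch_per_mod)
        self.head = nn.Linear(2 * 32 * len(ch_per_mod), n_classes)

    def forward(self, xs):                       # xs: list of per-modality tensors
        feats = [b(x) for b, x in zip(self.branches, xs)]
        return self.head(torch.cat(feats, dim=1))  # intermediate fusion

net = FusionNet()
xs = [torch.randn(2, 4, 256), torch.randn(2, 1, 256), torch.randn(2, 1, 256)]
print(net(xs).shape)                             # torch.Size([2, 3])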


Transformer representation learning is necessary for dynamic multi-modal physiological data on small-cohort patients

Wang, Bingxu, Ge, Min, Cai, Kunzhi, Zhang, Yuqi, Zhou, Zeyi, Li, Wenjiao, Guo, Yachong, Wang, Wei, Zhou, Qing

arXiv.org Artificial Intelligence

Postoperative delirium (POD), a severe neuropsychiatric complication affecting nearly 50% of high-risk surgical patients, is defined as an acute disorder of attention and cognition. It remains significantly underdiagnosed in intensive care units (ICUs) due to subjective monitoring methods. Early and accurate diagnosis of POD is critical and achievable. Here, we propose a POD prediction framework comprising a Transformer representation model followed by traditional machine learning algorithms. We curated the first multi-modal POD dataset encompassing two patient types and evaluated various Transformer architectures for representation learning. Empirical results indicate consistent improvements in sensitivity and Youden index for patient TYPE I when using Transformer representations, particularly our fusion adaptation of Pathformer. By enabling effective delirium diagnosis from postoperative day 1 to 3, our extensive experimental findings emphasize the potential of multi-modal physiological data and highlight the necessity of representation learning via multi-modal Transformer architectures in clinical diagnosis. Postoperative delirium, a prevalent acute neuropsychiatric syndrome [1, 2], affects more than 50% of surgical patients and significantly elevates morbidity and mortality risks [3]. Early identification is crucial yet challenging [4], primarily due to subjective assessment criteria and incomplete understanding of the underlying pathophysiological mechanisms [5].
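
A minimal sketch of the two-stage recipe in this abstract: a Transformer encoder turns multi-modal physiological sequences into fixed-length representations, and a traditional classifier is fit on top of them. The dimensions, mean pooling, and logistic-regression head are illustrative assumptions, not the authors' Pathformer-based pipeline.

# Transformer representations feeding a traditional classifier (sketch, synthetic data).
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

d_model, n_feats, seq_len = 64, 8, 48              # e.g. 48 hourly multi-modal vital vectors
proj = nn.Linear(n_feats, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2)

def embed(x):                                      # x: (batch, seq_len, n_feats)
    with torch.no_grad():
        return encoder(proj(x)).mean(dim=1)        # mean-pool over time -> (batch, d_model)

X = torch.randn(100, seq_len, n_feats)             # synthetic postoperative recordings
y = torch.randint(0, 2, (100,))                    # 1 = delirium on postoperative days 1-3
clf = LogisticRegression(max_iter=1000).fit(embed(X).numpy(), y.numpy())
print(clf.score(embed(X).numpy(), y.numpy()))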


An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis

Wei, Yingchen, Qiu, Xihe, Tan, Xiaoyu, Huang, Jingjing, Chu, Wei, Xu, Yinghui, Qi, Yuan

arXiv.org Artificial Intelligence

Obstructive sleep apnea-hypopnea syndrome (OSAHS) [1] affects about 27% of adults [2], causing poor sleep, daytime dysfunction, and higher risks of cardiovascular diseases and diabetes [3]. The standard diagnostic method, polysomnography (PSG) [4], is complex, costly, and uncomfortable, requiring multi-channel monitoring (EEG, ECG, heart rate [5]) and trained technicians (Figure 1). Data-driven methods for automated OSAHS diagnosis can improve efficiency and reduce costs. Facial features like a flat nasal bridge, wide jawbone, thick neck, and mandibular retrognathia correlate with OSAHS severity [6], providing visual indicators of airway obstruction and sleep disturbances. Deep learning can analyze these features for early diagnosis and personalized treatment. Our key contributions are as follows: (1) Introducing VTA-OSAHS, a multimodal framework for diagnosing OSAHS severity by combining visual and language data, and using a pre-trained language model to extract key information from basic physiological data for improved classification accuracy; (2) Developing a visual encoder that focuses on specific facial features associated with OSAHS, employing attention mesh and stochastic gates for better clinical decision alignment; (3) Implementing a data pre-processing strategy to handle imbalanced samples and ordinal classification, using RandomOverSampler (ROS) [17] and an ordinal regression loss function [18] to enhance accuracy and robustness; (4) Demonstrating ...
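
One common way to handle ordinal severity grades is a cumulative-threshold loss: severity grade k is encoded as k positive threshold targets and binary cross-entropy is applied over K-1 logits. The abstract cites an ordinal regression loss [18] without giving its form, so the sketch below is an illustrative stand-in rather than the authors' exact objective.

# Cumulative-threshold ordinal regression loss (illustrative sketch).
import torch
import torch.nn.functional as F

def ordinal_targets(labels, n_classes):
    # severity k -> [1]*k + [0]*(n_classes-1-k), e.g. grade 2 of 4 -> [1, 1, 0]
    thresholds = torch.arange(n_classes - 1)
    return (labels.unsqueeze(1) > thresholds).float()

def ordinal_loss(logits, labels, n_classes):
    # logits: (batch, n_classes-1), one sigmoid per "severity > k" question
    return F.binary_cross_entropy_with_logits(
        logits, ordinal_targets(labels, n_classes))

logits = torch.randn(4, 3)                   # 4 samples, 4 severity grades
labels = torch.tensor([0, 1, 2, 3])
print(ordinal_loss(logits, labels, n_classes=4))
# At inference, the predicted grade is the number of thresholds whose sigmoid exceeds 0.5.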


Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries

Battula, Harshavardhan, Liu, Jiacheng, Srivastava, Jaideep

arXiv.org Artificial Intelligence

In-hospital mortality (IHM) prediction for ICU patients is critical for timely interventions and efficient resource allocation. While structured physiological data provides quantitative insights, clinical notes offer unstructured, context-rich narratives. This study integrates these modalities with Large Language Model (LLM)-generated expert summaries to improve IHM prediction accuracy. Using the MIMIC-III database, we analyzed time-series physiological data and clinical notes from the first 48 hours of ICU admission. Clinical notes were concatenated chronologically for each patient and transformed into expert summaries using Med42-v2 70B. A multi-representational learning framework was developed to integrate these data sources, leveraging LLMs to enhance textual data while mitigating direct reliance on LLM predictions, which can introduce challenges in uncertainty quantification and interpretability. The proposed model achieved an AUPRC of 0.6156 (+36.41%) and an AUROC of 0.8955 (+7.64%) compared to a time-series-only baseline. Expert summaries outperformed clinical notes or time-series data alone, demonstrating the value of LLM-generated knowledge. Performance gains were consistent across demographic groups, with notable improvements in underrepresented populations, underscoring the framework's equitable application potential. By integrating LLM-generated summaries with structured and unstructured data, the framework captures complementary patient information, significantly improving predictive performance. This approach showcases the potential of LLMs to augment critical care prediction models, emphasizing the need for domain-specific validation and advanced integration strategies for broader clinical adoption.
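
As a toy illustration of multi-representational fusion, the sketch below concatenates structured time-series features with text embeddings of expert summaries before a single classifier. The summaries and features are placeholders, and a TF-IDF encoder stands in for the clinical language model and Med42-v2-generated summaries used in the paper.

# Early fusion of time-series features and summary-text embeddings (sketch, synthetic data).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

ts_features = np.random.default_rng(0).normal(size=(4, 6))   # e.g. 48-hour vital-sign statistics
summaries = [                                                 # placeholder expert summaries
    "stable hemodynamics, weaning ventilation",
    "rising lactate, escalating vasopressors",
    "afebrile, tolerating diet, ambulating",
    "worsening renal function, oliguria",
]
labels = np.array([0, 1, 0, 1])                               # in-hospital mortality

text = TfidfVectorizer().fit_transform(summaries).toarray()   # text representation
X = np.hstack([ts_features, text])                            # fuse both views
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))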